Compile-Time Loop Splitting for Distributed Memory Multiprocessors
Abstract
In a distributed memory multiprocessor, a program’s task is partitioned among the processors to exploit parallelism, and its data are partitioned to increase referential locality. Although the purpose of partitioning is to shorten the execution time of an algorithm, each data reference can become a complex expression that depends on the data partitions. To minimize the computation needed for array references, loop splitting can further divide a partitioned loop into segments that admit the code hoisting and strength reduction optimizations. This thesis introduces two methods of loop splitting, rational and interval: rational splitting divides the loop into equal-length GCD segments, while interval splitting specifies the segments as an explicit list of intervals. Both methods have been implemented and studied. Under our execution model, the loops in the algorithms analyzed execute, on average, 2 to 3 times faster after loop splitting.
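To make the optimization opportunity concrete, the sketch below (in C; it is not the thesis’s implementation, and the identifiers N, BLK, a_local, and b_local are hypothetical) shows a loop over a block-partitioned array. Before splitting, each iteration recomputes the owning block and the local offset with a division and a modulo; after splitting the loop at block boundaries, the block index is hoisted out of the inner loop and the offset computation strength-reduces to the inner induction variable.

    #include <stdio.h>

    #define N   32   /* global iteration count */
    #define BLK 8    /* block size of the data partition */

    static double a_local[N], b_local[N];

    /* Before splitting: every iteration pays a div and a mod to map the
       global index i onto its (block, offset) form under the partition. */
    static void unsplit(void)
    {
        for (int i = 0; i < N; i++) {
            int blk = i / BLK;   /* which block owns element i */
            int off = i % BLK;   /* offset inside that block   */
            a_local[blk * BLK + off] = b_local[blk * BLK + off] + 1.0;
        }
    }

    /* After splitting into equal-length segments: blk is loop-invariant in
       the inner loop (code hoisting) and the address computation becomes a
       simple base + off increment (strength reduction). */
    static void split(void)
    {
        for (int blk = 0; blk < N / BLK; blk++) {
            int base = blk * BLK;              /* hoisted out of the inner loop */
            for (int off = 0; off < BLK; off++)
                a_local[base + off] = b_local[base + off] + 1.0;
        }
    }

    int main(void)
    {
        unsplit();
        split();
        printf("%f\n", a_local[N - 1]);
        return 0;
    }

In an actual distributed setting the split points would come from the data distribution (for rational splitting, the equal-length GCD segments described above; for interval splitting, an explicit interval list), but the effect is the same: per-iteration index arithmetic moves out of the hot loop.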
Similar Papers
Access Descriptor Based Locality Analysis for Distributed-Shared Memory Multiprocessors
Most of today’s multiprocessors have a Distributed-Shared Memory (DSM) organization, which enables scalability while retaining the convenience of the shared-memory programming paradigm. Data locality is crucial for performance in DSM machines, due to the difference in access times between local and remote memories. In this paper, we present a compile-time representation that captures the memory ...
Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors
Run-time parallelization is often the only way to execute the code in parallel when data dependence information is incomplete at compile time. This situation is common in many important applications. Unfortunately, known techniques for run-time parallelization are often computationally expensive or not general enough. To address this problem, we propose new hardware support for efficient run-time...
Combining Dynamic and Static Scheduling on Distributed-Memory Multiprocessors
Loops are a large source of parallelism for many numerical applications. An important issue in the parallel execution of loops is how to schedule them so that the work-load is well balanced among the processors. Most existing loop scheduling algorithms were designed for shared-memory multiprocessors, with uniform memory access costs. These approaches are not suitable for distributed-memory mult...
Compiler Techniques for Software Prefetching on Cache-Coherent Shared-Memory Multiprocessors
This document describes a set of new techniques for improving the efficiency of compiler-directed software prefetching for parallel Fortran programs running on cache-coherent DSM (distributed shared memory) multiprocessors. The key component used in this scheme is a dataflow framework that exploits information about array access patterns and about the cache coherence protocol to predict at compile...
Extended Abstract: "No-Compile-Time Knowledge" Distribution of Finite Element Computations on Multiprocessors
This paper addresses partitioning and scheduling of irregular loops arising in finite element computations on unstructured meshes. Target computers are Distributed Memory Parallel Computers that provide a global address space. We introduce the concept of “Conditioned Iterations Loop”, which distributes the iterations dynamically according to a runtime condition. This technique is improved by a ...